DARL: Distance-Aware Uncertainty Estimation for Offline Reinforcement Learning

Abstract

To facilitate offline reinforcement learning, uncertainty estimation is commonly used to detect out-of-distribution data. By inspection, we show that current explicit uncertainty estimators such as Monte Carlo Dropout and model ensemble are not competent to provide trustworthy uncertainty estimation in offline reinforcement learning. Accordingly, we propose a non-parametric distance-aware uncertainty estimator which is sensitive to the change in the input space for offline reinforcement learning. Based on our new estimator, adaptive truncated quantile critics are proposed to underestimate the out-of-distribution samples. We show that the proposed distance-aware uncertainty estimator is able to offer better uncertainty estimation compared to previous methods. Experimental results demonstrate that our proposed DARL method is competitive with the state-of-the-art methods in offline evaluation tasks.
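The abstract does not give the form of the estimator or of the truncation rule, so the following is only a rough sketch of the general idea, not the DARL method itself. It assumes a k-nearest-neighbour distance as a stand-in for a non-parametric, distance-aware uncertainty score, and an exponential mapping from that score to the number of top quantiles dropped from a critic's output. The function names and the parameters `k`, `max_drop_frac`, and `scale` are hypothetical.

```python
import numpy as np


def knn_distance_uncertainty(queries, dataset, k=5):
    """Mean Euclidean distance from each query to its k nearest dataset points.

    Assumption: k-NN distance is used here purely as an illustrative
    non-parametric, distance-aware score; the paper's estimator may differ.
    """
    queries = np.atleast_2d(np.asarray(queries, dtype=float))
    dataset = np.atleast_2d(np.asarray(dataset, dtype=float))
    k = min(k, len(dataset))
    # Pairwise distances, shape (num_queries, num_dataset_points).
    diffs = queries[:, None, :] - dataset[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    # Keep the k smallest distances per query (in arbitrary order), then average.
    nearest = np.partition(dists, k - 1, axis=1)[:, :k]
    return nearest.mean(axis=1)


def truncated_quantile_value(quantiles, uncertainty, max_drop_frac=0.5, scale=1.0):
    """Average the sorted quantiles after dropping some of the largest ones.

    Assumption: the fraction dropped grows with uncertainty through a
    hypothetical exponential mapping, so far-from-data inputs receive a
    more pessimistic value estimate.
    """
    quantiles = np.sort(np.asarray(quantiles, dtype=float))
    # Map uncertainty in [0, inf) to a drop fraction in [0, max_drop_frac].
    drop_frac = max_drop_frac * (1.0 - np.exp(-uncertainty / scale))
    num_drop = int(np.floor(drop_frac * len(quantiles)))
    kept = quantiles[: len(quantiles) - num_drop]
    return float(kept.mean())
```

In this sketch, a state-action pair far from the dataset gets a larger score from `knn_distance_uncertainty`, and passing that score to `truncated_quantile_value` discards more of the highest quantiles, which lowers the resulting value estimate.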

Similar resources

Uncertainty-Aware Reinforcement Learning for Collision Avoidance

Reinforcement learning can enable complex, adaptive behavior to be learned automatically for autonomous robotic platforms. However, practical deployment of reinforcement learning methods must contend with the fact that the training process itself can be unsafe for the robot. In this paper, we consider the specific case of a mobile robot learning to navigate an a priori unknown environment while...

Direct Uncertainty Estimation in Reinforcement Learning

Optimal probabilistic approach in reinforcement learning is computationally infeasible. Its simplification consisting in neglecting difference between true environment and its model estimated using limited number of observations causes exploration vs exploitation problem. Uncertainty can be expressed in terms of a probability distribution over the space of environment models, and this uncertain...

Offline Evaluation of Online Reinforcement Learning Algorithms

In many real-world reinforcement learning problems, we have access to an existing dataset and would like to use it to evaluate various learning approaches. Typically, one would prefer not to deploy a fixed policy, but rather an algorithm that learns to improve its behavior as it gains more experience. Therefore, we seek to evaluate how a proposed algorithm learns in our environment, meaning we ...

Distance-Aware Beamforming for Multiuser Secure Communication Systems

Typical cryptography schemes are not well suited for low complexity types of equipment, e.g., Internet of things (IoT) devices, as they may need high power or impose high computational complexity on the device. Physical (PHY) layer security techniques such as beamforming (in multiple antennas systems) are possible alternatives to provide security for such applications. In this paper, we consid...

Value-Aware Loss Function for Model Learning in Reinforcement Learning

We consider the problem of estimating the transition probability kernel to be used by a model-based reinforcement learning (RL) algorithm. We argue that estimating a generative model that minimizes a probabilistic loss, such as the log-loss, might be an overkill because such a probabilistic loss does not take into account the underlying structure of the decision problem and the RL algorithm tha...

Journal

Journal title: Proceedings of the ... AAAI Conference on Artificial Intelligence

Year: 2023

ISSN: 2159-5399, 2374-3468

DOI: https://doi.org/10.1609/aaai.v37i9.26327